Internet Info 1997 December

Internet Message Format | 1997-02-19 | 3KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9) id KAA27189 for urn-ietf-out; Fri, 1 Nov 1996 10:12:16 -0500
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.6.10/8.6.9) with SMTP id KAA27184 for <urn-ietf@services.bunyip.com>; Fri, 1 Nov 1996 10:12:14 -0500
Received: from windrose.omaha.ne.us by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA05839 (mail destined for urn-ietf@services.bunyip.com); Fri, 1 Nov 96 10:12:11 -0500
Message-Id: <9611011512.AA05839@mocha.bunyip.com>
Received: by privateer.windrose.omaha.ne.us; Fri Nov 1 09:11 CST 1996
From: "Ryan Moats" <jayhawk@ds.internic.net>
To: "jayhawk@ds.internic.net" <jayhawk@ds.internic.net>, "Martin J Duerst" <mduerst@ifi.unizh.ch>
Cc: "urn-ietf@bunyip.com" <urn-ietf@bunyip.com>
Date: Fri, 01 Nov 96 09:12:22
Priority: Normal
X-Mailer: PMMail 1.52 For OS/2 UNREGISTERED SHAREWARE
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: [URN] %encoding for reserved UTF-8 characters (was: New syntax draft)
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: "Ryan Moats" <jayhawk@ds.internic.net>
Errors-To: owner-urn-ietf@bunyip.com

Sorry folks, but I need to keep the threads separate to keep my sanity
[at least whatever remains of it.... ;-)]

On Fri, 1 Nov 1996 12:11:37 +0100 (MET), Martin J Duerst wrote:

>When going from ASCII to UTF-8, there are some new problems.
>For ASCII, the general assumption is that only those characters
>that need escaping are actually escaped, and therefore that
>escaping, e.g. for "/" or whatever, shouldn't be undone.
>The above is not exactly true, indeed the "~" is often escaped
>as %7E in Europe because it is difficult to type on European
>keyboards, but still I assume there are a lot of tools around
>working on URLs in general that work along the rule "if ASCII
>is escaped, don't remove the escaping, because this is a special
>character."
>Now for UTF-8, things are quite different. 8-bit bytes will
>on many occasions be escaped because it may be difficult to
>represent them otherwise. Having some character beyond ASCII
>represented with %HH (usually %HH%HH or %HH%HH%HH) can in no
>way imply that this is a special character.
>This means that any tools dealing with URNs in general will
>have no clue about where to keep the escaping, and where
>to remove it. A very exact knowledge of each NSS syntax
>would be needed.

My brain hurts, but I think I finally understand the issue. Allow me
to restate it to see if I'm right:

The problem you are talking about arises if the reserved character has
a UTF-8 representation of more than 1 octet. Then, if we use %encoding
to represent the character in a literal use, there is no way of
determining from the URN whether the character is being used as a
literal character or not.

I'll save a discussion of potential solution directions to this until
I'm sure I understand the issue.

Ryan
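
The ambiguity discussed in this message can be illustrated with a minimal Python sketch. It is not part of the original thread: it assumes Python's standard urllib.parse module as a stand-in for a generic URL/URN tool, and the helper name escaped_octets is hypothetical, introduced only for illustration. It shows that an escaped ASCII "/" and an escaped non-ASCII character both end up as plain %HH sequences, so the escaping by itself says nothing about whether the character was meant as special or literal.

```python
import re
from urllib.parse import quote, unquote

# Matches a single %HH escape (illustrative pattern, not taken from any URN draft).
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def escaped_octets(text):
    """Hypothetical helper: return the octet value of every %HH escape in text."""
    return [int(hex_pair, 16) for hex_pair in ESCAPE.findall(text)]


# ASCII case: "/" is escaped because it is meant as a literal character.
ascii_case = quote("a/b", safe="")

# UTF-8 case: e-acute is escaped only because raw 8-bit octets are hard to carry.
utf8_case = quote("\u00e9", safe="")

print(ascii_case)                  # a%2Fb
print(utf8_case)                   # %C3%A9
print(escaped_octets(ascii_case))  # [47]
print(escaped_octets(utf8_case))   # [195, 169]

# Undoing the UTF-8 escape recovers the original character.
print(unquote(utf8_case) == "\u00e9")  # True
```

A tool following the "if ASCII is escaped, keep the escape" rule can act on the first result, since 47 is below 128. For the second result the same tool only sees octets above 127 and, as the quoted text argues, cannot tell from the escaping alone whether to keep or remove it without knowing the NSS syntax.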